Food insecurity is one of the most significant environmental justice challenges in the United States, affecting more than 42 million people. Approximately 10.5% of US households experience some form of food insecurity. The COVID-19 pandemic has exacerbated hunger in America, hitting hardest the families who were already facing hunger. Before the pandemic, more than 12 million children lived in food-insecure households, a number that has since risen to 13 million. BIPOC communities face the highest rates of food insecurity and hunger in the nation. Within the Bay Area, 11.5% of people identify as food insecure, yet only 38% of them qualify for food stamps. There are many ways to quantify food insecurity, but access to healthy food has been recognized as one of the most important.

To determine which areas are vulnerable to low access to healthy food, we used three measures to create an index: household income, Hispanic/Latino ethnicity, and SNAP eligibility. The USDA has developed a food access database that reports measures of supermarket accessibility by census tract. We aim to compare Alameda County, one of the areas facing the greatest food insecurity in the Bay Area, with San Francisco County. Both are similarly urban and densely populated, but they have drastically different food health and food access issues.
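The section does not say how the three measures were combined into one index. As a minimal sketch, assuming a simple equal-weight design, each tract-level measure can be converted to a z-score and the three z-scores averaged, with income negated so that a higher score always means more vulnerable. The field names (`median_income`, `pct_hispanic`, `pct_snap_eligible`) and the tract values are hypothetical, not taken from the USDA database.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def vulnerability_index(tracts):
    """Equal-weight average of three z-scored measures per tract (hypothetical design).

    Median income is negated so that, like the other two measures,
    a higher value means the tract is more vulnerable to low food access.
    """
    income_z = zscores([-t["median_income"] for t in tracts])
    hispanic_z = zscores([t["pct_hispanic"] for t in tracts])
    snap_z = zscores([t["pct_snap_eligible"] for t in tracts])
    return {
        t["tract"]: mean([inc, his, snap])
        for t, inc, his, snap in zip(tracts, income_z, hispanic_z, snap_z)
    }


# Made-up example tracts, not real census data
tracts = [
    {"tract": "A", "median_income": 45000, "pct_hispanic": 40.0, "pct_snap_eligible": 30.0},
    {"tract": "B", "median_income": 120000, "pct_hispanic": 10.0, "pct_snap_eligible": 5.0},
    {"tract": "C", "median_income": 80000, "pct_hispanic": 25.0, "pct_snap_eligible": 15.0},
]
index = vulnerability_index(tracts)
for tract, score in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(tract, round(score, 2))
```

Equal weights are the simplest choice; a real index might weight the measures differently or use percentile ranks instead of z-scores.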

Is there a statistical correlation between race, SNAP eligibility, and food access? What is the relationship between race and SNAP eligibility? What is the relationship between food access and income? What is the relationship between cardiovascular health and income level? Through these questions, we aim to draw conclusions about how race, SNAP eligibility, health metrics, and income relate to one another. We chose SNAP because it captures the intersection of food and income in a single variable. We also acknowledge that food insecurity is not an exclusively urban problem (there is substantial evidence of food insecurity in rural areas); however, the urban setting exacerbates many of the issues detailed above.



SNAP Eligibility per County in the Bay Area (grouped by county eligibility)

We found that Alameda, Santa Clara, Contra Costa, and San Francisco counties have the highest numbers of qualifying households in the Bay Area. For our equity analysis, we narrow our focus to Alameda County and San Francisco County: their shared urban density and differing food health and food access issues make them the most interesting pair to compare.
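Counting qualifying households per county amounts to a filter followed by a grouped sum. The sketch below assumes household-level survey records that carry a household weight (as Census PUMS files do); the record layout and the toy values are hypothetical.

```python
from collections import defaultdict


def eligible_households_by_county(records):
    """Sum household weights of SNAP-eligible households per county.

    Each record is a dict with hypothetical keys:
      county   - county name
      weight   - survey household weight (e.g. a PUMS-style household weight)
      eligible - True if the household meets the eligibility rule
    """
    totals = defaultdict(float)
    for r in records:
        if r["eligible"]:
            totals[r["county"]] += r["weight"]
    # Largest totals first, matching a ranked bar chart
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


records = [  # toy records, not real survey data
    {"county": "Alameda", "weight": 120, "eligible": True},
    {"county": "Alameda", "weight": 80, "eligible": False},
    {"county": "San Francisco", "weight": 90, "eligible": True},
    {"county": "Marin", "weight": 40, "eligible": True},
    {"county": "San Francisco", "weight": 20, "eligible": True},
]
print(eligible_households_by_county(records))
```

Summing weights rather than counting rows turns sample records into estimated household totals.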



Equity Analysis of SNAP Eligibility by Race

Compared with the overall population totals, the proportion of white people among those qualifying for SNAP decreased in both counties, while the proportion of Black or African American people increased in both counties. In San Francisco, the proportion of Asian people qualifying for SNAP increased slightly, whereas in Alameda County it decreased substantially. The proportions for Some Other Race Alone, Native Hawaiian, American Indian and Alaska Native Alone, and Two or More Races increased in both counties. This is not surprising, and the breakdown follows national trends (the proportion of white people being greatest, followed by Black/African American, then Hispanic and Asian). Based on these findings, we use Black or African American as our focus racial group for the remainder of the analysis (health effects only). Our results would likely differ if we included ethnicity, but for the purposes of this analysis we concentrate on race alone.
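The comparison above is between each group's share of the SNAP-eligible population and its share of the total population. A minimal sketch of that calculation, using made-up counts and abbreviated group labels, reports the difference in percentage points; a positive value means the group is over-represented among eligible people.

```python
def share_shift(total_counts, eligible_counts):
    """Percentage-point gap between each group's share of SNAP-eligible people
    and its share of the total population (positive = over-represented)."""
    total_n = sum(total_counts.values())
    eligible_n = sum(eligible_counts.values())
    return {
        group: round(
            100 * eligible_counts.get(group, 0) / eligible_n
            - 100 * total_counts[group] / total_n,
            1,
        )
        for group in total_counts
    }


# Made-up counts for one county
total = {"White": 500, "Black": 100, "Asian": 300, "Other": 100}
eligible = {"White": 30, "Black": 25, "Asian": 25, "Other": 20}
print(share_shift(total, eligible))
```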



Correlation between building type, SNAP allocation, income and tenure (by PUMAs)

## 
## Call:
## glm(formula = allocated ~ building + tenure + kitchen + puma, 
##     family = quasibinomial(), data = bay_pums_factored)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7395  -0.1308  -0.0925  -0.0804   3.8029  
## 
## Coefficients: (2 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -1.99476    0.16780 -11.888  < 2e-16 ***
## building1     -0.53764    1.15369  -0.466 0.641208    
## building2     -2.38299    1.08377  -2.199 0.027901 *  
## building3     -2.39846    1.11907  -2.143 0.032102 *  
## building4     -1.52385    1.09828  -1.387 0.165305    
## building5     -1.96203    1.11530  -1.759 0.078556 .  
## building6     -1.74665    1.11336  -1.569 0.116705    
## building7     -2.14812    1.13571  -1.891 0.058577 .  
## building8     -1.96011    1.11344  -1.760 0.078349 .  
## building9     -1.90332    1.08751  -1.750 0.080104 .  
## building10   -16.64537 2442.90475  -0.007 0.994563    
## tenure1       -1.94449    0.42379  -4.588 4.49e-06 ***
## tenure2       -1.74122    0.45605  -3.818 0.000135 ***
## tenure3       -1.37416    0.40777  -3.370 0.000753 ***
## tenure4             NA         NA      NA       NA    
## kitchen1       0.62022    1.01353   0.612 0.540583    
## kitchen2            NA         NA      NA       NA    
## puma00102      0.37462    0.25820   1.451 0.146813    
## puma00103      0.25043    0.29779   0.841 0.400380    
## puma00104      0.83790    0.28209   2.970 0.002978 ** 
## puma00105     -0.03149    0.33310  -0.095 0.924676    
## puma00106     -0.09903    0.33120  -0.299 0.764945    
## puma00107      0.75253    0.24647   3.053 0.002267 ** 
## puma00108     -0.31020    0.40300  -0.770 0.441467    
## puma00109      0.02390    0.31199   0.077 0.938939    
## puma00110     -0.03119    0.29856  -0.104 0.916798    
## puma07501      0.04377    0.28349   0.154 0.877296    
## puma07502     -0.29094    0.35408  -0.822 0.411265    
## puma07503     -1.03416    0.41854  -2.471 0.013485 *  
## puma07504    -14.84549  294.06285  -0.050 0.959737    
## puma07505     -1.21286    0.53230  -2.279 0.022704 *  
## puma07506     -2.57843    1.01483  -2.541 0.011067 *  
## puma07507      0.08109    0.37183   0.218 0.827358    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.9956621)
## 
##     Null deviance: 3008.2  on 25787  degrees of freedom
## Residual deviance: 2448.1  on 25757  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 18

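Results from the logit model:

Building type: There is no strong relationship between building type and SNAP allocation. Only building types 2 and 3 are significant at the 5% level, and building10 has an extremely large standard error, a sign that almost no households of that type are allocated.

Tenure: All three estimable tenure coefficients are highly significant (p < 0.001). Although the differences are not drastic, renters are more likely to be allocated SNAP than owners.

Kitchen: Outcome 1 signifies “has a kitchen”; its coefficient is statistically insignificant (p = 0.54). This factor shows no detectable relationship, so the results do not support our initial hypothesis of a strong correlation. Outcome 2 means “has no kitchen” and returns NA: R reports it as not defined because of singularities, which is consistent with there being essentially no households without a kitchen in the data.

PUMA: PUMA 07504 in San Francisco County, corresponding to The Mission, Castro, Duboce Triangle, and Haight Ashbury, has a very large negative estimate (-14.85). Its standard error (294.06) is so large that the estimate is not statistically significant (p = 0.96); this pattern usually means almost no households in the PUMA are allocated, so the model cannot settle on a finite coefficient. This suggests that very few people in the area have income below $66,000/year and are eligible for SNAP. Most other PUMAs are not significantly different from the reference PUMA, although 00104 and 00107 have significantly higher odds of allocation and 07503, 07505, and 07506 have significantly lower odds.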
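Because the model uses a logit link, each estimate is a change in log-odds relative to the factor's reference level, and exponentiating it gives an odds ratio. The sketch below applies this to a few rows copied from the output and flags very large standard errors, a common symptom of (quasi-)complete separation. The `se_limit` cutoff of 10 is an arbitrary heuristic, and reading `tenure1` and `tenure3` as owned-with-mortgage and rented assumes the factor levels keep the standard PUMS tenure codes.

```python
import math

# (term, estimate, std. error) copied from the glm output above
COEFS = [
    ("tenure1", -1.94449, 0.42379),
    ("tenure3", -1.37416, 0.40777),
    ("puma07504", -14.84549, 294.06285),
    ("building10", -16.64537, 2442.90475),
]


def summarize(term, est, se, se_limit=10.0):
    """Odds ratio vs. the factor's reference level, the t value, and a separation flag.

    se_limit is an arbitrary heuristic: a logit-scale standard error this
    large usually means a level has (almost) no allocated households, or
    (almost) only allocated ones, so the estimate drifts toward +/- infinity.
    """
    return {
        "term": term,
        "odds_ratio": math.exp(est),
        "t": est / se,
        "suspect_separation": se > se_limit,
    }


for row in COEFS:
    s = summarize(*row)
    print(f"{s['term']:<11} OR={s['odds_ratio']:.3g} t={s['t']:.2f} separation?={s['suspect_separation']}")

# Odds of allocation for tenure3 relative to tenure1
# (assumes PUMS codes: 3 = rented, 1 = owned with mortgage or loan)
print(round(math.exp(-1.37416 - (-1.94449)), 2))
```

Under that coding assumption, renters would have roughly 1.8 times the odds of allocation of owners with a mortgage, holding the other factors fixed.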



CalEnviroScreen: correlating Cardiovascular Health and Poverty in Alameda and San Francisco Counties

This map shows a notable difference in cardiovascular health between San Francisco and Alameda County. The difference is most pronounced in the San Leandro/Hayward area, which has a score of 21.04.

Compared with the previous map, this map shows a much more even distribution of households in poverty across the two counties: both contain areas with similarly low and similarly high poverty levels.

The scatter plot does not show a clear relationship: there are several outliers, and the points themselves appear almost random.

## 
## Call:
## lm(formula = Poverty ~ `Cardiovascular Disease`, data = bay_cardio_poverty_tract)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.261 -10.497  -3.829   6.105  60.538 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               13.4087     1.8709   7.167 2.49e-12 ***
## `Cardiovascular Disease`   0.8202     0.1667   4.920 1.15e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.74 on 549 degrees of freedom
## Multiple R-squared:  0.04223,    Adjusted R-squared:  0.04048 
## F-statistic:  24.2 on 1 and 549 DF,  p-value: 1.147e-06

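A one-unit increase in the Cardiovascular Disease score is associated with an increase of 0.82 in the Poverty score (13.41 is the intercept, the predicted Poverty score when the Cardiovascular Disease score is 0). The R² of 0.042 means that only 4.2% of the variation in Poverty is explained by variation in Cardiovascular Disease. The p-value of 1.147e-06 is below 0.05, so the relationship is statistically significant, even though it is weak.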
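Plugging numbers into the fitted line makes the difference between the slope and the intercept concrete. The constants below are copied from the output; the function name is hypothetical, and the example input 21.04 is the tract score mentioned for the San Leandro area, used here only as an illustration.

```python
# Coefficients copied from the lm() output above: Poverty ~ `Cardiovascular Disease`
INTERCEPT = 13.4087
SLOPE = 0.8202
SLOPE_SE = 0.1667
R_SQUARED = 0.04223


def predicted_poverty(cardio_score):
    """Poverty score predicted by the fitted line for a given Cardiovascular Disease score."""
    return INTERCEPT + SLOPE * cardio_score


# Raising the cardio score by one unit adds SLOPE (not INTERCEPT) to the prediction
print(round(predicted_poverty(11) - predicted_poverty(10), 4))
# Illustrative prediction for a tract scoring 21.04
print(round(predicted_poverty(21.04), 2))
# The t value R reports is the estimate divided by its standard error
print(round(SLOPE / SLOPE_SE, 2))
# With a single predictor, sqrt(R^2) equals the absolute correlation
print(round(R_SQUARED ** 0.5, 3))
```

The correlation of about 0.21 reinforces that the association, while significant, is weak.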

The graph above shows the distribution of residuals from our model. Although the peak is fairly close to 0, it sits slightly to the left, and the distribution has a long right tail (the residuals range from -23.3 to 60.5, and the median residual is -3.8), so it is right-skewed rather than symmetric. We therefore try a logarithmic version of our model to bring the distribution closer to normal.

The scatter plot still does not show a clear relationship as there are several outliers. The biggest difference between this scatter plot and the previous one is that the y axis has a smaller range.

## 
## Call:
## lm(formula = log(`Cardiovascular Disease`) ~ Poverty, data = bay_cardio_poverty_tract)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.11942 -0.24454  0.01079  0.22975  0.78557 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.1918798  0.0263256  83.261  < 2e-16 ***
## Poverty     0.0047181  0.0009856   4.787 2.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3477 on 549 degrees of freedom
## Multiple R-squared:  0.04007,    Adjusted R-squared:  0.03832 
## F-statistic: 22.92 on 1 and 549 DF,  p-value: 2.18e-06

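In this model the response is log(Cardiovascular Disease) and the predictor is Poverty, so the roles of the two variables are reversed from the first model. A one-point increase in Poverty is associated with an increase of 0.0047 in log(Cardiovascular Disease), or roughly a 0.47% increase in the Cardiovascular Disease score (2.19 is the intercept, not the slope). The R² of 0.040 means that 4.0% of the variation in log(Cardiovascular Disease) is explained by variation in Poverty. The p-value of 2.18e-06 is below 0.05, so these results are statistically significant.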
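In a model with a logged response, a slope b corresponds to a percent change of 100 * (exp(b) - 1) in the original variable per unit of the predictor. The sketch below applies this to the coefficients copied from the output; the function names are hypothetical. Back-transforming a prediction with exp() gives a value on the geometric-mean scale, and this sketch makes no bias correction for that retransformation.

```python
import math

# Coefficients copied from the lm() output above: log(`Cardiovascular Disease`) ~ Poverty
INTERCEPT = 2.1918798
SLOPE = 0.0047181


def pct_change_cardio(poverty_increase):
    """Percent change in the Cardiovascular Disease score implied by a rise in Poverty."""
    return 100 * (math.exp(SLOPE * poverty_increase) - 1)


def predicted_cardio(poverty):
    """Back-transformed prediction (geometric-mean scale, no bias correction)."""
    return math.exp(INTERCEPT + SLOPE * poverty)


print(f"{pct_change_cardio(1):.2f}%")   # one-point rise in Poverty
print(f"{pct_change_cardio(10):.2f}%")  # ten-point rise in Poverty
print(round(predicted_cardio(0), 2))    # prediction at Poverty = 0
```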

The log model is a modest improvement: its residuals are much more symmetric (minimum -1.12, median 0.01, maximum 0.79), although they are still not perfectly normal and R² barely changes. This lets us draw some conclusions with caution: there is a statistically significant but weak positive association between Cardiovascular Disease and Poverty.
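One rough way to back up the claim that the log model's residuals are more symmetric is to compute the quartile (Bowley) skewness, (Q3 + Q1 - 2 * median) / (Q3 - Q1), directly from the residual quartiles printed by R. Values near 0 indicate symmetry and positive values a long right tail. This uses only the printed summaries, so it is a quick check rather than a formal normality test.

```python
def bowley_skewness(q1, median, q3):
    """Quartile skewness in [-1, 1]: 0 = symmetric, > 0 = right tail, < 0 = left tail."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Residual quartiles copied from the two lm() outputs above
linear = bowley_skewness(-10.497, -3.829, 6.105)
logged = bowley_skewness(-0.24454, 0.01079, 0.22975)
print(f"linear model: {linear:.3f}")
print(f"log model:    {logged:.3f}")
```

With these quartiles the linear model comes out at about 0.20 (right-skewed) and the log model at about -0.08, much closer to symmetric.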

The residuals closest to zero are concentrated in San Francisco, while the largest residuals are in Alameda County. This means the model's predictions match the observed data fairly closely in San Francisco. In Alameda County, and especially in the shoreline communities, the observed values differ substantially from the model's predictions.
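The map comparison can be summarized numerically by averaging the absolute residuals (observed minus fitted) within each county. The sketch assumes rows of (county, observed, fitted) values; the numbers below are invented for illustration and are not our model's residuals.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_residual_by_county(rows):
    """Average |observed - fitted| per county; smaller means the model fits better there."""
    groups = defaultdict(list)
    for county, observed, fitted in rows:
        groups[county].append(abs(observed - fitted))
    return {county: mean(vals) for county, vals in groups.items()}


rows = [  # made-up (county, observed, fitted) triples on the log scale
    ("San Francisco", 2.30, 2.25),
    ("San Francisco", 2.10, 2.20),
    ("Alameda", 2.90, 2.30),
    ("Alameda", 1.70, 2.35),
]
summary = mean_abs_residual_by_county(rows)
print({county: round(value, 3) for county, value in summary.items()})
```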



Equity Analysis of Individuals More Than 1 Mile from a Grocery Store (food desert)



The USDA defines food deserts as census tracts that are low income and in which more than a third of the population lives more than one mile from a grocery store or supermarket (10 miles for rural areas). Below is a map of the low-income, low-access tracts measured at one mile (treating all areas as urban).
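The definition above can be written as a small classification rule. In this sketch each tract is split into hypothetical sub-areas, each with a population and a distance to the nearest supermarket; computing those distances from real store locations is outside its scope. It follows only the share-of-population wording given above.

```python
def distance_threshold_miles(urban):
    """Low-access distance cutoff: 1 mile for urban areas, 10 miles for rural ones."""
    return 1.0 if urban else 10.0


def is_food_desert(low_income, subareas, urban=True):
    """Apply the low-income + low-access rule to one tract.

    subareas is a list of (population, miles_to_nearest_supermarket) pairs,
    a hypothetical breakdown of the tract's residents by distance.
    """
    limit = distance_threshold_miles(urban)
    total = sum(pop for pop, _ in subareas)
    far = sum(pop for pop, miles in subareas if miles > limit)
    low_access = total > 0 and far / total > 1 / 3
    return low_income and low_access


# Made-up tract: 600 of 1,000 residents live more than a mile from a store
tract = [(400, 0.5), (300, 1.4), (300, 2.0)]
print(is_food_desert(True, tract))               # low income and low access
print(is_food_desert(False, tract))              # low access but not low income
print(is_food_desert(True, tract, urban=False))  # rural cutoff of 10 miles
```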

Alameda County has a much larger overall population than San Francisco, which limits how well the raw counts in this graph can be compared. We therefore plot a second graph showing percentages instead.
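Converting raw counts to within-county percentages removes the effect of Alameda's larger population, which is what makes the second graph comparable across counties. A minimal sketch with made-up counts and abbreviated group labels:

```python
def to_percentages(counts_by_county):
    """Convert raw counts per race group into percentages within each county."""
    result = {}
    for county, counts in counts_by_county.items():
        total = sum(counts.values())
        result[county] = {group: 100 * n / total for group, n in counts.items()}
    return result


counts = {  # made-up counts of people in low-access tracts
    "Alameda": {"White": 600, "Asian": 300, "Black": 100},
    "San Francisco": {"White": 200, "Asian": 100, "Black": 100},
}
pct = to_percentages(counts)
print(pct)
```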

The percentage graph, however, is very informative. It shows that in both San Francisco and Alameda County, white people are the racial group facing the fewest food access issues. Next comes the Black or African American community in San Francisco and the Asian community in Alameda County; in third place, these two communities are swapped between the counties. The only other significant race category is Two or More Races.



Reflection
In future work, perhaps next quarter, it would be very interesting to plot individual grocery stores on a map and layer our equity analysis on top of it to reveal the relationship between race and food access more clearly. Our analyses mostly supported our hypothesis that food access is related to race, so we would like to extend this exploration with more tools next quarter.

This project gave us the opportunity to delve deeper into a serious issue within the Bay Area using our fall quarter toolkit. Although much of our analysis was fairly surface-level, we were still able to produce meaningful, statistically significant results.